Training an Arabic LLM that reflects local values

The Arab world did not play a key role in the PC, internet and mobile eras. In the AI era, it will be different. (Shutterstock)

Advances in the large language models that underpin generative AI are changing everything, from medicine and education to entertainment.

Our relationship with technology is becoming more intimate as machines change from passive tools into active assistants that amplify our innate human abilities.

This new era poses both a challenge and an opportunity for the Middle East.

The challenge is that leaders in this new field, like OpenAI’s ChatGPT and Google’s Gemini, come from Silicon Valley, or from China, where my team at 01.AI has built models that rival those of the Americans. In Europe, too, startups such as France’s Mistral have entered the race.

The opportunity is for the Middle East to join this league and make sure its voice is heard.

Inspired by my latest trip to Riyadh, I decided to test how the current crop of AI models would handle a simple request. I imagined myself as a young Saudi getting ready to host a dinner party and asked ChatGPT to prepare a menu.

The food it recommended sounded delicious — stuffed grape leaves, tabouleh salad, mandi and stuffed dates. But the beverages were a problem.

Aside from drinks such as mint lemonade and jallab, a mixture of dates, grape molasses and rose water, ChatGPT also offered this: “For alcoholic beverages, you could offer a selection of international wines, beers, or non-alcoholic mocktails.”

To its credit, when I repeated the question, it offered only non-alcoholic drinks.

If a model recommends breaking both the law and cultural norms, imagine how it might answer more sensitive questions about politics or religion. Indeed, researchers have shown that some models exhibit an anti-Muslim bias.

My modest test underlines the urgent need to develop an Arabic large language model that reflects local values.

The first step to building this is creating enough high-quality Arabic digitized data to properly train a new generation of models.

Although there are 400 million Arabic speakers, only an estimated 2 percent of online content is in Arabic. Meta’s open-source LLM Llama is overwhelmingly trained on English data, with Arabic comprising less than 0.1 percent of the data.

The lack of data naturally skews the results. To fix this dearth of data, either a visionary entrepreneur or a government-backed organization should collect, digitize and convert the many Arabic books into training data for Arabic models.

Once the data is gathered, it can be fed into the breakthrough pre-training process, which reads trillions of words and creates its own virtual concept space or model of the world. This concept space has been shown to be mostly in English and Chinese.

Adding a sizable number of texts in Arabic, which has enormous cultural output and significance, will make the concept space more knowledgeable about Arabic and more balanced in its concepts and views.

After such pre-training, the model needs to be fine-tuned on data and labels from the Arab world, which will align it with the values of the region. That sets it apart from American models, which are aligned to US values, and Chinese models, which reflect Chinese values.

The collection of alignment data, the coordination of human labeling and the alignment process will need to be done in-region by AI experts.

Finally, safety modules will need to be added to ensure legal compliance and to avoid harm. These will also need to be developed locally.

The above steps will create localized, sovereign models that will reflect the traditions of the Middle East. Privately developed or government-backed, they could be the foundation for a new wave of Arabic AI innovation.

A new Arabic-enhanced large language model could encourage entrepreneurs and developers to build new applications tailored to the needs of their nations.

Imagine an AI tool that could find, summarize, organize and write insightful content, an AI teacher that makes learning fun and customized, an AI doctor that is more knowledgeable than any human, an AI engineer that can write software and applications, and an AI assistant that knows its owner better than the owner themselves.

The Arab world did not play a leading role in the PC, internet and mobile eras. In the AI era, it will be different.

This transformation is by no means an easy feat. It will require an unprecedented investment of money, energy and human capital.

Middle Eastern leaders like Saudi Crown Prince Mohammed bin Salman and others have shown that they have the vision, determination and resources to lead their countries into the future.

Standing on my hotel balcony in Jeddah recently, overlooking the King Abdullah University of Science and Technology, I saw part of that vision coming to fruition.

Universities such as KAUST and the Mohamed bin Zayed University of Artificial Intelligence in the UAE are striking examples of the resources that have already been poured into this transformation.

These world-class academic institutions can attract and retain top-tier global talent. It is especially important to bring in the world’s best computer engineers to help fulfill this vision for the future of AI.

Our team at 01.AI has shown what a group of talented and motivated computer scientists can achieve in just one year. With the right commitment of resources and drawing upon the best talent, countries like Saudi Arabia can easily catch up with their global peers.

The Middle East can also lead the world in the use of renewables to run power-hungry generative AI models.

As it seeks to diversify its economy, Saudi Arabia is actively promoting the use of alternative energy sources such as solar, which could power server farms and reduce their carbon footprint — a growing concern as AI becomes more widespread.

It may take time for countries to figure out their strategy for building a sovereign AI. But it is critical for the Arab world to quickly catalyze the creation of culturally appropriate LLMs and build a rich ecosystem to allow AI-powered Arabic apps to blossom.

A recent encounter with a female sales assistant at a computer store in Riyadh served as an apt reminder of what is at stake. Dressed in jeans and sporting a tattoo, she embodied the transformative changes that the country is undergoing.

Where are you from, I asked. “I’m Saudi,” she said. “One day I want to be Saudi Arabia’s Elon Musk.” I hope on my next visit she will pitch me a homegrown AI app.

Kai-Fu Lee is a computer scientist, CEO of 01.AI, chairman of Sinovation Ventures, former president of Google China, and author of “AI 2041” and “AI Superpowers”
 

Disclaimer: Views expressed by writers in this section are their own and do not necessarily reflect Arab News' point of view

What We Are Reading Today: ‘Snakes of Australia’

Authors: Tie Eipper & Scott Eipper

With more than 1,000 photographs, Snakes of Australia illustrates and describes in detail all 240 of the continent’s species and subspecies, from file snakes, pythons, colubrids, and natricids to elapids, marine elapids, homalopsids, and blind snakes. It features introductions to each family, species descriptions, type locations, distribution maps, and quick-identification keys to families and genera.

It also covers English and scientific names, appearance, range, ecology, disposition, danger level, and IUCN Red List Category.


Pakistan’s Punjab bans washing cars at home in bid to conserve water

  • Pakistan high court last Friday issued directives to ban washing cars at homes in Punjab
  • Punjab Environment Agency says will impose fine of Rs10,000 [$35.75] on violators

ISLAMABAD: The government in Pakistan’s eastern Punjab province on Thursday banned washing cars at home, saying that it would impose a fine of Rs10,000 [$35.75] on violators as it seeks to implement a high court’s earlier directive to conserve water. 

The Environmental Protection Agency Punjab issued the directives in compliance with an order issued last Friday by the Lahore High Court (LHC), which banned the washing of cars at home and directed authorities to consider imposing a fine of Rs10,000 [$35.75] on violators.

The high court also directed that filling stations without water treatment plants should be sealed with an initial warning, followed by a fine of Rs100,000 [$357.50]. 

The directives came after the court heard several petitions related to ineffective measures by officials against smog, local media reports said. 

“Ban on the use of water for washing of cars and use of hose pipes in the houses,” a notification from the EPA said. “Anyone found in violation of these directions will be imposed a fine of Rs.10,000.”

The provincial agency also banned oil washing of vehicles, and ordered immediate closure of all illegal/unapproved car wash and service stations in the province in compliance with the court’s orders. 

“Mandatory installation of carwash wastewater recycling system and U-Channels at all Car wash Stations by 28th February, 2025,” the notification said.

“In case the petrol pumps are found to be lacking in their obligations in this regard, fine of Rs. 100,000/- shall be imposed on the defaulting petrol pumps, in addition to sealing of car wash area.”

The notification cited an earlier warning by the Pakistan Meteorological Department (PMD) in which it had highlighted that Punjab had experienced 42 percent below normal rainfall from Sept. 1, 2024, to Jan. 15, 2025. 

The PMD had said that Sindh, Balochistan and Punjab were the most affected provinces where rainfall deficits of 52 percent, 45 percent, and 42 percent, respectively, have been recorded.

Water-stressed Pakistan has a population of 241.49 million people with a growth rate of 2.55 percent. Linked to that, per capita water availability has been on a downward trend for decades.

In 1947, when Pakistan was created, the figure stood at about 5,000 cubic meters per person, according to the World Bank. Today it is 1,000 cubic meters.

It will decline further with the population expected to double in the next 50 years, climate change experts say, pointing out that Pakistan needs intervention on a range of water-related issues: from the impact of climate change to hydropower, from transboundary water-sharing to irrigated and rain-fed agriculture, and from drinking water to sanitation.


Saudi Interior Ministry establishes General Department for Community Security and Combating Human Trafficking Crimes

  • Department was set up after a directive from Crown Prince Mohammed bin Salman
  • It aims to eliminate crimes by dismantling criminal networks in coordination with local and international authorities

RIYADH: Saudi Arabia’s Ministry of Interior established the General Department for Community Security and Combating Human Trafficking Crimes on Thursday to further ensure public safety.

The newly established body will be linked to the General Directorate of Public Security, following a directive from Crown Prince Mohammed bin Salman, the Saudi Press Agency reported.

The ministry said that the department aims to combat crimes that infringe on personal rights, violate fundamental freedoms under Islamic Shariah laws, or undermine individual dignity.

It also aims to eliminate crimes by dismantling criminal networks in coordination with local and international authorities, the SPA added.


Israel asserts presence in five strategically significant high points in southern Lebanon

  • Lebanon rejects any extension of Israeli forces’ presence in the border areas
  • Egyptian foreign minister: Resolution 1701 must be implemented by all parties

BEIRUT: Ahead of the scheduled Friday meeting of the five-member committee overseeing the implementation of the ceasefire agreement in southern Lebanon, Israel preemptively announced its decision to maintain a military presence in five strategic points overlooking the southern sectors.

The Israeli announcement — through both its officials and Israeli media — came four days before the extended deadline for withdrawing its forces, which have advanced into Lebanese territory.

On Wednesday night, Israeli warplanes conducted low-altitude flights, breaking the sound barrier over Beirut and several other regions, including the Bekaa Valley.

The maneuver came only hours after Lebanon rejected any extension of Israeli forces’ presence in the border areas, into which they have advanced since Oct. 1.

Political analysts interpreted the aerial incursion as “an act of intimidation designed to pressure Lebanon into accepting the situation.”

Lebanon has rejected any extension of the Israeli occupation of its territory. On Thursday, President Joseph Aoun reaffirmed that “Lebanon is intensifying diplomatic efforts to ensure Israel’s withdrawal by February 18.”

He said that the country was actively engaging with influential global powers, particularly the US and France, to secure a sustainable resolution.

During his meeting with Youssef Rajji, foreign minister in Lebanon’s newly formed government, Egyptian Foreign Minister Badr Abdelatty underscored the need to enforce the ceasefire agreement in southern Lebanon and demanded the immediate, full withdrawal of Israeli forces. He also stressed the importance of enforcing Resolution 1701, with all parties complying without exception.

On Thursday, Israeli Strategic Affairs Minister Ron Dermer announced that Israel would retain control over five strategic high points inside Lebanon following the expiry of the ceasefire next Tuesday. He emphasized that while the Israeli army would redeploy, it would maintain its presence in these key positions until Lebanon met its commitments under the agreement.

“Lebanon’s obligations do not entail removing Hezbollah from the border, but rather disarming it,” Dermer told Bloomberg.

While the Israeli minister did not specify how long the Israeli army would remain in the strategic high points, he said: “The army will not withdraw in the near future.”

On Wednesday, Ori Gordin, the chief of the Israeli army’s Northern Command, made a call “to solidify Israel’s presence in these positions under American cover and with international support.”

The Israeli Broadcasting Authority quoted senior officials in the Security Cabinet of Israel as saying that “the US has granted Israeli forces permission to remain in several locations in Lebanon long-term beyond Feb. 18.”

Israeli media reported that “the Israeli army has received US approval to establish observation points to monitor Hezbollah’s activities, while the US side rejected postponing the Israeli withdrawal from the villages where it is still carrying out incursions.”

These Israeli statements coincided with a round of talks held on Thursday between US Maj. Gen. Jasper Jeffers, the US representative on the committee monitoring the implementation of the ceasefire agreement, and Israeli officials. As a result, the committee’s meeting in Ras Naqoura, originally scheduled for Thursday, was postponed to Friday.

Lebanon has rejected a joint US-French proposal to take control of these five strategic positions along the border, insisting instead that UN peacekeeping forces — UNIFIL — assume control of these points in coordination with the Lebanese army.

The disputed hills, which the Israeli military refuses to evacuate, include Jabal Blat, Labouneh, Aziziyah, Awida and Hamames. All these positions are strategically located but uninhabited.

According to local media reports in Beirut, Israeli forces have begun constructing prefabricated structures with guard posts along the Markaba-Houla road, adjacent to an existing UNIFIL position near the border. 


Saudi counter-narcotics authorities thwart drug smuggling attempts in several regions

Updated 13 February 2025
  • Border Guard seized 100 kg of hallucinogenic khat leaves, valued at approximately $5,300
  • In Jizan, authorities arrested a gang of Yemeni nationals for smuggling khat

RIYADH: Saudi Arabia’s counter-narcotics authorities thwarted drug smuggling and dealing attempts, including of hashish, khat, and methamphetamine, in various cities on Thursday.

The General Directorate of Border Guard arrested an Ethiopian citizen for attempting to smuggle 49,350 unlicensed medical tablets in the Red Sea region of Asir, located in southwest Saudi Arabia.

The Border Guard thwarted another smuggling attempt in Asir, which borders Yemen, where they seized 100 kg of hallucinogenic khat leaves, valued at approximately $5,300.

In Jizan, located in the southwest of the Kingdom, authorities arrested a gang of Yemeni nationals for smuggling 5.5 kg of hashish and 30 kg of khat; in another incident, they seized 108 kg of khat.

In Dammam in the eastern region, the General Directorate of Mujahideen arrested a Saudi citizen for selling the toxic methamphetamine drug known locally as Al-Shabu.

Drug smuggling is a serious crime in Saudi Arabia. It is punishable by up to 15 years in prison, along with 50 lashes and a fine for first-time offenders involved in smuggling, consuming or marketing drugs. However, individuals who repeatedly commit this crime may face the death penalty, according to the General Directorate of Narcotics Control.

Security authorities urged the public to report drug smuggling or selling by calling 911 in Makkah, Riyadh and the Eastern Province, or 999 in other regions.

Reports can also be made to the General Directorate of Narcotics Control at 995 or via email at [email protected]. All information will remain strictly confidential.